Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification
We introduce a family of unsupervised, domain-free, and (asymptotically)
model-independent algorithms based on the principles of algorithmic probability
and information theory designed to minimize the loss of algorithmic
information, including a lossless-compression-based lossy compression
algorithm. The methods can select and coarse-grain data in an
algorithmic-complexity fashion (without the use of popular compression
algorithms) by collapsing regions that may procedurally be regenerated from a
computable candidate model. We show that the method can preserve the salient
properties of objects and perform dimension reduction, denoising, feature
selection, and network sparsification. As a validation case, we demonstrate that
the method preserves all the graph-theoretic indices measured on a well-known
set of synthetic and real-world networks of very different nature, ranging from
degree distribution and clustering coefficient to edge betweenness and degree
and eigenvector centralities, achieving results equal or significantly better
than those of other data-reduction approaches and some of the leading network sparsification
methods. The methods (InfoRank, MILS) can also be applied to tasks such as
image segmentation based on algorithmic probability.

Comment: 23 pages in double column including Appendix; online implementation
at http://complexitycalculator.com/MILS
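The core idea of MILS-style sparsification, greedily deleting the element whose removal perturbs an algorithmic-complexity estimate the least, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the paper estimates algorithmic probability directly (e.g., via block decomposition) rather than using popular compression algorithms, whereas this sketch substitutes zlib-compressed length as a crude, easy-to-run complexity proxy; the function names are hypothetical.

```python
import zlib

def complexity(adj):
    """Crude complexity proxy: compressed length of the serialized
    adjacency matrix. Stand-in for the algorithmic-probability
    estimates used in the paper (illustration only)."""
    data = b"".join(bytes(row) for row in adj)
    return len(zlib.compress(data, 9))

def mils_sparsify(adj, n_remove):
    """Greedy MILS-style sketch for an undirected graph: repeatedly
    delete the edge whose removal changes the complexity estimate
    the least, preserving as much algorithmic information as possible."""
    adj = [row[:] for row in adj]  # work on a copy
    n = len(adj)
    for _ in range(n_remove):
        base = complexity(adj)
        best = None  # (information loss, i, j) of the cheapest edge
        for i in range(n):
            for j in range(i + 1, n):
                if adj[i][j]:
                    # tentatively remove edge (i, j) and measure the change
                    adj[i][j] = adj[j][i] = 0
                    loss = abs(complexity(adj) - base)
                    adj[i][j] = adj[j][i] = 1
                    if best is None or loss < best[0]:
                        best = (loss, i, j)
        if best is None:
            break  # no edges left to remove
        _, i, j = best
        adj[i][j] = adj[j][i] = 0
    return adj
```

Swapping `complexity` for a block-decomposition estimator would bring the sketch closer to the method the abstract describes; the greedy deletion loop itself is unchanged.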